
qip-0012: Qi UTXO Pruning #36

Open · wizeguyy wants to merge 2 commits into master from utxo-pruning

Conversation

@wizeguyy (Contributor) commented Apr 2, 2024

No description provided.

We achieve this simply by limiting the size of the UTXO set. If a transaction
creates new UTXOs which would exceed the UTXO set capacity, we destroy the
smallest UTXOs in the ledger. This is practical to do thanks to the fixed
denominations in the Qi ledger.


How do I find the smallest and oldest UTXO in the ledger? The UTXO trie is not organized in a FIFO manner (or any organization except for some key prefixing, as far as I can tell)

@wizeguyy (Contributor, Author)

Yep, it will involve some filter routine on these events. As an optimization, we can consider indexing by denomination or something like that.


Can you add some detail in the QIP regarding how this might be achieved? I was under the impression that indexing would be optional, not in consensus

@wizeguyy (Contributor, Author)

Well, this is a different kind of indexing than the indexers used by the RPC. It could be implemented any number of ways, up to each implementation, so I don't want the QIP to say "this is the way it's done". But for context, here are some ways it could be done:

**Just-In-Time Scanning (not performant):**

```rust
let mut denomination = MAX_DENOMINATION;
let mut delete_list = Vec::new();

// First scan and collect the keys of every UTXO to be deleted
for utxo in &set {
    // Found a new smaller denomination. Reset the scanner.
    if utxo.denomination < denomination {
        denomination = utxo.denomination;
        delete_list.clear();
    }

    // If the UTXO matches the smallest denomination seen so far, add it to the delete list
    if utxo.denomination == denomination {
        delete_list.push(utxo.key);
    }
}

// Now go back and delete each key found in the scan
for key in delete_list {
    set.delete(key);
}
```

**Keeping Denominations By Index:**

```rust
struct UtxoSet {
    utxos: HashMap<UtxoKey, Utxo>,
    denominations: HashMap<Denomination, HashSet<UtxoKey>>,
}

impl UtxoSet {
    // Add a UTXO to the set, and prune the set if it gets too large
    fn add_utxo(&mut self, utxo: Utxo) {
        // ... make sure it's a valid UTXO ...

        // Add to the denomination index
        self.denominations
            .entry(utxo.denomination)
            .or_default()
            .insert(utxo.key);

        // Add to the UTXO set
        self.utxos.insert(utxo.key, utxo);

        // Check if the set is too large, and trigger deletions
        if self.utxos.len() > UTXO_SET_CAPACITY {
            // Find the smallest denomination which still has live UTXOs
            let min_denomination = self
                .denominations
                .iter()
                .filter(|(_, keys)| !keys.is_empty()) // skip denominations with no existing UTXOs
                .map(|(den, _)| *den)
                .min();

            // Delete every UTXO in the smallest denomination list
            if let Some(den) = min_denomination {
                let doomed: Vec<UtxoKey> = self.denominations[&den].iter().copied().collect();
                for key in doomed {
                    self.delete_utxo(key);
                }
            }
        }
    }

    // Delete a UTXO from the set, and if it existed, delete it from the indexed lists
    fn delete_utxo(&mut self, key: UtxoKey) {
        if let Some(utxo) = self.utxos.remove(&key) {
            if let Some(keys) = self.denominations.get_mut(&utxo.denomination) {
                keys.remove(&key);
            }
        }
    }
}
```

The second requires more memory (effectively doubling the UTXO set), but takes very little time to prune the set.

There could be trade-off approaches, e.g. one which only indexes the keys of the smallest denomination, but the logic to get that right is beyond the scope of this thread, lol


The deletion should also take into account when the UTXO was created, right? i.e. the smallest and oldest, dustiest UTXOs are deleted first


Perhaps an ordered list should be maintained, organized by FIFO order and denomination. It would have to be updated for each block, and perhaps even committed to in the header. Hopefully insertions are no worse than O(log n)...
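A minimal sketch of such an ordered index (hypothetical types and names, not from the QIP; a `BTreeSet` of `(denomination, creation_block, key)` tuples sorts dust-first and oldest-first, with O(log n) insertion):

```rust
use std::collections::BTreeSet;

type Denomination = u8;
type BlockNumber = u64;
type UtxoKey = [u8; 32];

// Tuples compare lexicographically, so the set orders by denomination first,
// then by creation block: the first element is always the smallest, oldest UTXO.
struct PruneIndex {
    ordered: BTreeSet<(Denomination, BlockNumber, UtxoKey)>,
}

impl PruneIndex {
    // O(log n) per new UTXO
    fn insert(&mut self, den: Denomination, created: BlockNumber, key: UtxoKey) {
        self.ordered.insert((den, created, key));
    }

    // Remove and return the dustiest (smallest, oldest) UTXO in O(log n)
    fn pop_dustiest(&mut self) -> Option<(Denomination, BlockNumber, UtxoKey)> {
        self.ordered.pop_first()
    }
}
```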

@wizeguyy (Contributor, Author) Apr 4, 2024

Yeah, I just gave some quick examples because you asked. There's a million ways to skin this cat.

qip-0012.md (outdated)

We set a max trie depth of 10, which corresponds to a max UTXO set size of
$16^{10} \approx 1$ trillion UTXOs. If a transaction mints new UTXOs which exceed
the $16^{10}$ limit, the node shall prune all of the smallest UTXOs from the UTXO
@jdowning100 Apr 2, 2024


How long would it take to recompute the root of the trie with 1 trillion nodes at a depth of 10? Average case for, say, an 8-core CPU? There's a max number of UTXOs that can be emitted and destroyed per block based on the block gas limit, which gives some upper bound, I suppose.

@wizeguyy (Contributor, Author)

Well, the computation isn't the limit. It's usually the disk IOPS to read/write each trie node that limits you. To strictly answer your question, here's some napkin math:

An 8c/16t 4 GHz CPU, assuming Keccak takes 15 cycles per byte (source: Wikipedia) and each trie node being up to 17×32 bytes:

```
17 * 32 * 15 = 8160 cycles per trie-node hash
4 GHz / 8160 ≈ 490K hashes/s per thread
490K hashes/s * 16 threads ≈ 7.8M hashes/s
10 node hashes / 7.8M hashes/s ≈ 1.3 µs per root update
```

But, as I mentioned at the start, this is just the compute component. The dominant cost is actually the IOPS the disk can handle. A high-end SSD tends to get around 45K IOPS, which equates to ~23 µs per disk access. At 10 trie levels, you need 460 µs just to read 10 original nodes and write 10 new ones, plus 2×23 µs for the leaf node itself. Let's call it 500 µs to add a single UTXO to the trie. So we could add about 2K UTXOs per second to the trie at level 10.
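Generalizing that napkin math (same assumptions, with $d$ the trie depth and $r$ the disk's random IOPS, each insert reading and writing the $d+1$ nodes on the leaf-to-root path):

$$t_{insert} \approx \frac{2(d+1)}{r}, \qquad \text{throughput} \approx \frac{r}{2(d+1)} = \frac{45000}{22} \approx 2000 \text{ UTXOs/s for } d = 10$$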

That is the naive implementation. A good implementation will amortize some of those costs with batch operations, but that's beyond the scope of my napkin math. There are also some costs not accounted for here, e.g. the time to look up and remove spent UTXOs from the set.

@jdowning100 Apr 3, 2024

Where did the 10 come from? I'm curious about the length of time to recompute the root of a PMT with a trillion elements. Shouldn't the computation use 1 trillion trie nodes?

@wizeguyy (Contributor, Author)

I assume you mean 1 trillion leaves/accounts, not including all the intermediate trie nodes, right?

10 trie nodes is the depth, $\log_{16}$, of a trie with 1 trillion leaves. An optimized PMT update algorithm (not recomputing from scratch) can update the root in $O(\log_{16} n)$ time.
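Concretely, the arithmetic for a trillion-leaf trie:

$$\left\lceil \log_{16} 10^{12} \right\rceil = \left\lceil \frac{12}{\log_{10} 16} \right\rceil = \lceil 9.97 \rceil = 10$$

so an incremental update rehashes only the ~10 nodes along one leaf-to-root path, not all $10^{12}$ leaves.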

I thought you were asking for the CPU time, so I gave you some napkin math for that, but I realize now you are just asking for total recomputation time, which again is dominated by IOPS, not CPU performance.

There are a LOT of factors that could influence disk access performance (disk speed, how busy the disk is with other software, database in-memory caching/paging strategies, etc.), so it's not reasonable to try and "napkin math" it here. You'd have to benchmark a particular implementation to get an idea.

@wizeguyy force-pushed the utxo-pruning branch 3 times, most recently from 980ebd2 to ca3ad3a on April 3, 2024 at 21:12
@wizeguyy changed the title from "Utxo pruning" to "qip-0012: Qi UTXO Pruning" on Apr 4, 2024
## Specification
We achieve this simply by limiting the size of the UTXO set. If a transaction
creates new UTXOs which would exceed the UTXO set capacity, we destroy the
smallest UTXOs in the ledger. This is practical to do thanks to the fixed


Should be smallest and oldest

@jdowning100 commented Jul 23, 2024

Similarly to QIP 11, the gas cost to create a new UTXO should increase as the size of the set grows. Once we've hit the limit, it is sensible to destroy small and old UTXOs, but it should also be more expensive to create new ones. The total cost of a transaction can be offset by destroying UTXOs (by using them as inputs to the transaction).
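A minimal sketch of what that fee curve could look like (purely illustrative; the constants, the linear ramp, and the refund rule here are assumptions, not part of QIP 11 or this QIP):

```rust
const BASE_UTXO_GAS: u64 = 1_000;       // hypothetical base cost to create one UTXO
const UTXO_SET_CAPACITY: u64 = 1 << 40; // 16^10 ≈ 1.1 trillion, per the proposed depth-10 limit

// Gas for a transaction creating `outputs` new UTXOs and destroying `inputs`,
// with creation cost ramping linearly from 1x (empty set) to 2x (full set).
fn utxo_gas(set_size: u64, inputs: u64, outputs: u64) -> u64 {
    let create_cost = BASE_UTXO_GAS + (BASE_UTXO_GAS * set_size) / UTXO_SET_CAPACITY;
    let refund = BASE_UTXO_GAS * inputs; // destroying UTXOs offsets the total cost
    (outputs * create_cost).saturating_sub(refund)
}
```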
